perf[buffer]: iteration for fallible operations with validity#8120
Conversation
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_dict_primitive_into_canonical[u32, (1000, 10, 10)] |
120.7 µs | 183 µs | -34.05% |
| ❌ | Simulation | encode_varbin[(1000, 2)] |
176.1 µs | 237.1 µs | -25.74% |
| ❌ | Simulation | take_10k_random |
196.9 µs | 255.6 µs | -22.97% |
| ❌ | Simulation | patched_take_10k_contiguous_patches |
230.6 µs | 290.8 µs | -20.69% |
| ❌ | Simulation | patched_take_10k_random |
242.6 µs | 302.8 µs | -19.89% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] |
161.8 µs | 198 µs | -18.28% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
177.1 µs | 213.5 µs | -17.06% |
| ❌ | Simulation | bench_many_codes_few_values[1024] |
393.2 µs | 465.6 µs | -15.55% |
| ❌ | Simulation | decompress_rd[f64, (100000, 0.0)] |
845.5 µs | 982.8 µs | -13.97% |
| ❌ | Simulation | varbinview_large |
112.2 µs | 130.4 µs | -13.97% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
273.8 µs | 307.9 µs | -11.08% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(100, 100)] |
326.4 µs | 365 µs | -10.58% |
| ⚡ | Simulation | sum_i32_nullable_all_valid |
69.2 µs | 35.4 µs | +95.64% |
| ⚡ | Simulation | null_count_run_end[(10000, 4, 0.01)] |
125.4 µs | 91.6 µs | +36.98% |
| ⚡ | Simulation | encode_varbinview[(1000, 2)] |
189 µs | 157.1 µs | +20.26% |
| ⚡ | Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] |
229.3 µs | 192.7 µs | +18.96% |
| ⚡ | Simulation | and_bool_nullable |
93.7 µs | 82.6 µs | +13.41% |
| ⚡ | Simulation | baseline_lt[4, 1024] |
78.5 µs | 69.9 µs | +12.23% |
| ⚡ | Simulation | decompress_rd[f64, (100000, 0.01)] |
981.2 µs | 890.4 µs | +10.2% |
| ⚡ | Simulation | decompress_rd[f64, (100000, 0.1)] |
981.2 µs | 890.4 µs | +10.19% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ji/fast-iter-valid (7ed76bc) with develop (679e2c5)2
Footnotes
-
11 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
-
No successful run was found on
develop(9f494a1) during the generation of this report, so 679e2c5 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
4b444dd to
72bca8b
Compare
|
Open question is where to put this code? |
|
Sounds like we want a crate in between the array and vortex-buffer or this could be a feature flag in vortex-buffer |
|
I was thinking I cannot remember if this was a problem before? |
|
there was no problem with vortex-compute iirc. There were some constructs that made it hard in the past but I think we're fine now. |
|
This PR has been marked as stale because it has been open for 14 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days |
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
| /// The kernels in this crate require this trait instead of `Iterator` so that lane | ||
| /// reads carry no inter-iteration data dependency — the autovectorizer treats each | ||
| /// lane independently. | ||
| pub trait IndexedSource { |
There was a problem hiding this comment.
These traits really look like Buf/BufMut from bytes which we already implement for buffers and are implemented for similar things
There was a problem hiding this comment.
I think the random access read/write
| /// | ||
| /// Use this to drive a binary kernel from two columns. Length equality is enforced | ||
| /// at construction. | ||
| pub struct LaneZip<A, B>(pub A, pub B); |
There was a problem hiding this comment.
Seems like a good candidate for implementing Iterator and ExactSizeIterator
| /// `impl<S: IndexedSource> IndexedSourceExt for S` below. Bring the trait into | ||
| /// scope (`use vortex_compute::lane_kernels::IndexedSourceExt;`) to call | ||
| /// them with method syntax: `values.try_map_masked_into(&mask, &mut out, f)`. | ||
| pub trait IndexedSourceExt: IndexedSource + Sized { |
There was a problem hiding this comment.
Can we skip the ext traits? just increases the mental load of tracking where things happen IMO
There was a problem hiding this comment.
What's another way of doing this?
There was a problem hiding this comment.
why can't it be default functions on the core trait?
There was a problem hiding this comment.
I really don't think its worth having default functions here we likely want more of these
There was a problem hiding this comment.
Do this really matter?
| // Skip the fallible kernel when type widening or (cached) min/max prove every value fits. | ||
| let target_dtype = DType::Primitive(T::PTYPE, Nullability::NonNullable); | ||
| let infallible = casts_losslessly_to(F::PTYPE, T::PTYPE) | ||
| || cached_values_fit_in(array, &target_dtype) == Some(true); |
| Mask::AllTrue(_) => BufferMut::try_from_trusted_len_iter( | ||
|
|
||
| // Returns `true` if every value of `from` is representable in `to` without loss. | ||
| fn casts_losslessly_to(from: PType, to: PType) -> bool { |
There was a problem hiding this comment.
doesn't need to be a function
There was a problem hiding this comment.
I prefer this only that the body does easily read like that?
| @@ -0,0 +1,15 @@ | |||
| // SPDX-License-Identifier: Apache-2.0 | |||
There was a problem hiding this comment.
Should this be in vortex-buffer?
There was a problem hiding this comment.
I did think about putting this there, but it just seemed like the wrong place
|
we need to add the crate to the benchmarks, since it's a new one it has to be enumerate in the ci matrix |
Currently use (and arrow) handle fallible operations with scalar (non-SIMD) code.
This PR add a trait and methods to have fast SIMD checked operations (includes cast) but verified else where that
checked_addbenefits